In this document I collect the information and figures used to name the clusters of the Sang scWBM experiment one.
Cluster naming workflow
1. Use SingleR to compare the cluster centroids from our analysis against reference datasets and find the reference cell type with the highest correlation to each cluster. The reference datasets are: 830 microarray samples of pure mouse immune cells generated by the Immunological Genome Project (ImmGen; Aran et al., 2019), and 358 bulk RNA-seq samples of sorted cell populations available from GEO (Benayoun et al., 2019).
2. Use SingleR to compare individual cells to the reference cell types (same references as step 1).
3. Use marker gene expression to confirm or revise the cluster labels generated in steps 1 and 2.
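The idea behind the first two steps can be illustrated with a small, language-neutral sketch in Python. It is not the SingleR implementation, which also selects informative genes and fine-tunes its calls; it only shows the core step of ranking reference cell types by Spearman correlation against a cluster centroid. All gene values, labels, and function names here are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.
    Assumes neither profile is constant (otherwise this divides by zero)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def centroid(cells):
    """Mean expression per gene across the cells of one cluster."""
    return [mean(gene_values) for gene_values in zip(*cells)]

def best_label(profile, reference):
    """Return the reference label with the highest correlation, and its score."""
    scores = {label: spearman(profile, ref) for label, ref in reference.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]

# Hypothetical toy data: four genes, two reference cell types, one cluster.
reference = {
    "B cell": [9.0, 1.0, 0.5, 2.0],
    "Granulocyte": [0.5, 8.0, 7.0, 1.0],
}
cluster_cells = [
    [5.0, 0.0, 1.0, 2.0],
    [7.0, 1.0, 0.0, 3.0],
]
label, score = best_label(centroid(cluster_cells), reference)
print(label, round(score, 3))
```

Step 1 corresponds to calling `best_label` on a centroid, and step 2 to calling it on each cell's own expression vector.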
Above is a graphic from a package associated with SingleR. It is interesting but has too much going on.
The final score given to each cluster. For each cluster, the label from whichever of the two references gave the higher score was taken as the final label.
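As a minimal sketch of that selection rule, assuming each reference yields one (label, score) pair per cluster, the final label can be picked as follows. The cluster numbers, labels, and scores below are made up for illustration and are not results from this experiment.

```python
def pick_final_labels(immgen, rnaseq):
    """For each cluster, keep the label from the reference with the higher score.

    Both arguments map cluster id -> (label, score). On a tie, the ImmGen
    label is kept because it is listed first.
    """
    final = {}
    for cluster in immgen:
        candidates = [
            ("ImmGen", *immgen[cluster]),
            ("RNA-seq", *rnaseq[cluster]),
        ]
        source, label, score = max(candidates, key=lambda c: c[2])
        final[cluster] = (label, source, score)
    return final

# Hypothetical scores for two clusters.
immgen_calls = {0: ("Neutrophils", 0.71), 1: ("B cells", 0.64)}
rnaseq_calls = {0: ("Granulocytes", 0.78), 1: ("B cells", 0.60)}

final_labels = pick_final_labels(immgen_calls, rnaseq_calls)
for cluster, (label, source, score) in final_labels.items():
    print(cluster, label, source, score)
```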
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 1000 828 813 774 541 538 451 351 218 194 191 151 71
## [1] 18
Next I’m going to use lineage-specific markers to manually inspect the cell clusters. The list was provided by Dr. Sang.
Granulocytes: Elane, Mpo, Ctsg, Prtn3, Azu1
B-cell: Ighd, Cd20, Cd22, Jchain
Megakaryocytes: Itga2b, Gp9, Pf4, Selp, Gp1ba
HSPC: Crhbp, Emcn, Hlf, Avp, Cd34, Kit, Sca-1
Monocyte: Cd14
Macrophage: Cd45, F480, Lyz2
Erythroid: Epor, Klf1, Tfr2, Csf2rb, Gypa
T-cell/NK: Cd3g
MEP: Lbg, Nr4a1, Gpr141, Gata1
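Some entries in this list are common protein or antigen names rather than official mouse gene symbols (CD20 is encoded by Ms4a1, Sca-1 by Ly6a, CD45 by Ptprc, and F4/80 by Adgre1), so they may not match the feature names in the expression matrix. The sketch below, written in Python purely for illustration, shows one way to map such names before checking which markers are present. The feature set is hypothetical, and the helper name `resolve_markers` is my own.

```python
# Known aliases from the marker list mapped to official mouse gene symbols.
ALIASES = {
    "Cd20": "Ms4a1",
    "Sca-1": "Ly6a",
    "Cd45": "Ptprc",
    "F480": "Adgre1",
}

def resolve_markers(markers, features):
    """Map aliases to gene symbols, then split into present and missing."""
    present, missing = [], []
    for marker in markers:
        symbol = ALIASES.get(marker, marker)
        if symbol in features:
            present.append(symbol)
        else:
            missing.append(symbol)
    return present, missing

# Hypothetical set of feature names from an expression matrix.
features = {"Ighd", "Ms4a1", "Cd22", "Ptprc", "Lyz2"}

present, missing = resolve_markers(["Ighd", "Cd20", "Cd22", "Jchain"], features)
print("present:", present)
print("missing:", missing)
```

Markers that end up in `missing` would need to be checked by hand, since a symbol can be absent because of a naming mismatch or because the gene was filtered out or never detected.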
## [1] TRUE TRUE TRUE TRUE FALSE
The granulocyte markers are only consistently expressed in cluster 6, with some marker expression in clusters 4, 5, and 7.
Jchain showed little to no expression in any cluster.
The other B-cell markers are expressed almost exclusively in clusters 3 and 10.
Most megakaryocyte marker expression was found in cluster 12, with strong Itga2b expression in cluster 4 and some Pf4 expression in cluster 8.
## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Emcn.
## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Avp.
Emcn, Avp, Cd34, and Hlf showed little to no expression in any cluster.
Crhbp has mild expression in cluster 10, and Kit has mild expression in cluster 6.
Cd14 showed little to no expression in any cluster.
Lyz2 is widely expressed in all clusters.
Epor, Klf1, and Tfr2 showed little to no expression in any cluster.
Csf2rb shows widespread expression, with the highest expression in cluster 4. Gypa is expressed only in cluster 9.
Cd3g shows expression in cluster 11, almost exclusively.
Gpr141 shows expression in multiple clusters. Expression of Gata1 is exclusive to cluster 4.
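The observations above are qualitative. One way to back them with numbers is to compute, for each marker, the fraction of cells per cluster with non-zero expression. The sketch below is a minimal Python illustration of that summary, assuming one expression value and one cluster label per cell. The values are invented and are not taken from this dataset.

```python
from collections import defaultdict

def fraction_expressing(expression, clusters):
    """Fraction of cells in each cluster with expression above zero."""
    totals = defaultdict(int)
    positive = defaultdict(int)
    for value, cluster in zip(expression, clusters):
        totals[cluster] += 1
        if value > 0:
            positive[cluster] += 1
    return {cluster: positive[cluster] / totals[cluster] for cluster in totals}

# Hypothetical per-cell data: cluster label and expression of one marker.
clusters = [11, 11, 11, 3, 3, 6]
cd3g = [2.1, 0.0, 1.4, 0.0, 0.0, 0.0]

fractions = fraction_expressing(cd3g, clusters)
for cluster, fraction in sorted(fractions.items()):
    print(cluster, round(fraction, 2))
```

A marker described as "almost exclusive" to one cluster would show a high fraction there and values near zero elsewhere.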
These are markers that I found through a literature search and through PanglaoDB (for example, the top markers for HSPCs).
Many of the markers are the same between my list and Dr. Sang’s list.
## B.cell MK HSPC Monocyte Macrophage Erythroid T.cell.NK MEP
## 1 Ighd Itga2b Fgd5 Clec12a Cd14 Klf1 Gata3 Tspan9
## 2 Sox4 Pf4 Mecom Cxcl10 Ccr5 Tmod1 Tbx21 Treml1
## 3 Selp Egr1 Psap Cd5 Ank1 Rorc Cd59a
## 4 Cd47 Ncor2 ifitm3 Slamf9 Alas2 Foxp3
## 5 Gata2 Thsd1 Lilra5 Bpgm Cd5
## 6 Plk3 Nkx3-1 Mgl2 Rhag Il2rb
## 7 Runx1 Hlx Ccl12 Grsf1 Zfp683
## 8 Cfp Ilr3a Clec4a2
## 9 Tspan9 Cd33
## 10 Treml1 Itga4
## 11 Anpep
## ProliferationMarkers Myeloid Eprog Plasma
## 1 Mpo Csf3r Hba-a1 Sik1
## 2 Mki67 Elane Hbb-bt
## 3 Top2a Mpo Hba-a2
## 4 Snca
## 5 Cd59a
## 6
## 7
## 8
## 9
## 10
## 11
Clusters 3 and 10 are high in B-cell markers, which is consistent through all the stages.
## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Nkx3-1.
## Warning in FeaturePlot(wbm, features = mrkrs2, ncol = 2): All cells have the
## same value (0) of Nkx3-1.